BOUNDS ON CHANGES IN RITZ VALUES FOR A PERTURBED
INVARIANT SUBSPACE OF A HERMITIAN MATRIX* 

M. E. ARGENTATI, A. V. KNYAZEV, C. C. PAIGE, AND I. PANAYOTOV

Abstract. The Rayleigh-Ritz method is widely used for eigenvalue approximation. Given a
matrix $X$ with columns that form an orthonormal basis for a subspace $\mathcal{X}$, and a Hermitian
matrix $A$, the eigenvalues of $X^H A X$ are called Ritz values of $A$ with respect to $\mathcal{X}$. If the
subspace $\mathcal{X}$ is $A$-invariant then the Ritz values are some of the eigenvalues of $A$. If the
$A$-invariant subspace $\mathcal{X}$ is perturbed to give rise to another subspace $\mathcal{Y}$, then the vector of
absolute values of changes in Ritz values of $A$ represents the absolute eigenvalue approximation
error using $\mathcal{Y}$. We bound the error in terms of principal angles between $\mathcal{X}$ and $\mathcal{Y}$. We
capitalize on ideas from a recent paper [DOI: 10.1137/060649070] by A. Knyazev and M. Argentati,
where the vector of absolute values of differences between Ritz values for subspaces $\mathcal{X}$ and $\mathcal{Y}$
was weakly (sub-)majorized by a constant times the sine of the vector of principal angles between
$\mathcal{X}$ and $\mathcal{Y}$, the constant being the spread of the spectrum of $A$. In that result no assumption
was made on either subspace being $A$-invariant. It was conjectured there that if one of the trial
subspaces is $A$-invariant then an analogous weak majorization bound should be much stronger, as it
should only involve terms of the order of sine squared. Here we confirm this conjecture.
Specifically we prove that the absolute eigenvalue error is weakly majorized by a constant times
the sine squared of the vector of principal angles between the subspaces $\mathcal{X}$ and $\mathcal{Y}$, where
the constant is proportional to the spread of the spectrum of $A$. For many practical cases we show
that the proportionality factor is simply one, and that this bound is sharp. For the general case
we can only prove the result with a slightly larger constant, which we believe is artificial.

Key words. Hermitian matrices, angles between subspaces, majorization, Lidskii's eigenvalue
theorem, perturbation bounds, Ritz values, Rayleigh-Ritz method, invariant subspace.

AMS subject classification. 15A18, 15A42, 15A57, 15A60. 

1. Introduction. Eigenvalue problems appear in many applications. For exam- 
ple eigenvalues represent the frequencies of vibration in mechanical vibrations, while 
the energy levels of a system are the eigenvalues of the Hamiltonian in quantum me- 
chanics. Eigenvalue problems are used today in these and many other applications, 
including spectral data clustering and internet search engines. 

Eigenvalues cannot be computed exactly except in some trivial cases, so numerical 
approximation is required. Eigenvalue a posteriori and a priori error bounds describe 
the eigenvalue approximation quality, and this is a classical and important topic in 
matrix analysis. A posteriori bounds are based on information readily computable, 
e.g., the eigenvector residuals, and are necessary, e.g., for adaptive numerical meth- 
ods for eigenvalue approximation. A priori bounds are given in terms of theoretical 
properties, and can be very useful in assessing relative performance of algorithms.

The widely used Rayleigh-Ritz method is well known for its ability to generate
high quality approximations to eigenvalues of Hermitian matrices. It is the basis for
many numerical procedures for computing eigenvalues, such as finite element methods
and the Lanczos eigenproblem iteration. Eigenvalue error bounds for the Rayleigh-Ritz
method are important since they provide estimates and predictions of the quality
of eigenvalue approximations, and can be used, e.g., to predict the number of iterations
needed in the Lanczos method for computing some eigenvalues to within a given
accuracy. There is a vast literature on Rayleigh-Ritz eigenvalue methods and error
bounds, see, e.g., [16, Chapter 4], [19, Chapters 10-13], and [20, Chapters 3-5].

We contribute to this traditional area of research with a new twist — using weak 
majorization. Majorization is a classical technique that can be used to formulate and 
prove a great variety of inequalities in a concise and elegant way. It is widely used in 
matrix analysis, e.g., to bound perturbations of eigenvalues via Lidskii's beautiful
theorem [17]. In the context of Rayleigh-Ritz eigenvalue error bounds, weak majorization
was introduced in the celebrated work of Davis and Kahan [3] to bound eigenvalue 
errors a posteriori. In the present paper we propose and prove what appear to be 
the first theorems based on weak majorization for a priori Rayleigh-Ritz eigenvalue 
error bounds. Our results provide a theoretical foundation that can be applied in a 
number of situations, for example for finite element methods, e.g., [4], and for block Lanczos
iterations such as in [5], see [14]. 

We use several well known majorization results found, e.g., in [1, 7, 18]. We give 
references throughout the paper for the concepts we introduce. For a more thorough 
background and reference list, see [13]. 

The rest of the paper is organized as follows. Section 2 contains all necessary 
definitions and basic facts on majorization that we need for our eigenvalue and sin- 
gular value bounds. Section 3 is the main part of the paper, where we motivate and 
formulate our conjectures and theorems. Section 4 has all our proofs. In section 5 we 
show that our main results are sharp; we also discuss our proofs, and the possibility 
that our bound for the most general case might be slightly improved. 

2. Definitions and Prerequisites. We introduce the definitions and tools we 
need, together with some mild motivation. We do not provide proofs for the results 
in this section — instead we refer the reader to some of the relevant literature. 

2.1. Notation. For a real vector $x = [x_1, \ldots, x_n]^T$, we use
$x^\downarrow = [x_1^\downarrow, \ldots, x_n^\downarrow]^T$ to denote $x$ with its elements rearranged in
descending order, while $x^\uparrow = [x_1^\uparrow, \ldots, x_n^\uparrow]^T$ denotes $x$ with its elements
rearranged in ascending order. We use $|x|$ to denote the vector $x$ with the absolute value of
its components. We use the '$\le$' symbol to compare real vectors component-wise. For real
vectors $x$ and $y$ the expression $x \prec y$ means that $x$ is majorized by $y$, while
$x \prec_w y$ means that $x$ is weakly (sub-)majorized by $y$, see section 2.2.

We consider the Euclidean space $\mathbb{C}^n$ of column vectors equipped with the standard
scalar product $x^H y$ and the norm $\|x\| = \sqrt{x^H x}$. We use the same notation $\|A\|$ for
the induced matrix norm of a complex matrix $A \in \mathbb{C}^{n \times n}$. The notation
$\mathcal{X} = \mathcal{R}(X) \subset \mathbb{C}^n$ means that the subspace $\mathcal{X}$ is equal to the range of
the matrix $X$ with $n$ rows. The unit matrix is $I$ and the zero matrix (not necessarily
square) is $0$, while $e = [1, \ldots, 1]^T$. We use $\mathcal{H}(n)$ to denote the set of
$n \times n$ Hermitian matrices and $\mathcal{U}(n)$ to denote the set of $n \times n$ unitary
matrices in the set $\mathbb{C}^{n \times n}$ of all $n \times n$ complex matrices.

We write $\lambda(A) = \lambda^\downarrow(A)$ for the vector of eigenvalues of $A \in \mathcal{H}(n)$
arranged in descending order, and we write $s(B) = s^\downarrow(B)$ for the vector of singular
values of $B$ arranged in descending order. Individual eigenvalues and singular values are
denoted by $\lambda_i(A)$ and $s_i(B)$, respectively, so, e.g.,
$\mathrm{spr}(A) = \lambda_1(A) - \lambda_n(A)$ and $s_1(B) = \|B\|$.

Let subspaces $\mathcal{X}$ and $\mathcal{Y} \subset \mathbb{C}^n$ have the same dimension, with orthonormal
bases given by the columns of the matrices $X$ and $Y$ respectively. We denote the vector
of principal angles between $\mathcal{X}$ and $\mathcal{Y}$ arranged in descending order by
$\theta(\mathcal{X},\mathcal{Y}) = \theta^\downarrow(\mathcal{X},\mathcal{Y})$, and define it using
$\cos\theta(\mathcal{X},\mathcal{Y}) = s^\uparrow(X^H Y)$, e.g., [2], [6, §12.4.3].
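
The notation above can be made concrete with a short NumPy sketch. It is only an illustration: the function names `orthonormal_basis`, `ritz_values` and `principal_angles` are hypothetical and do not come from the paper, and the cosine-based formula is used because it mirrors the definition, even though it is known to lose accuracy for very small angles.

```python
import numpy as np

def orthonormal_basis(M):
    """Orthonormal basis (as columns) for range(M); M is assumed to have full column rank."""
    Q, _ = np.linalg.qr(M)
    return Q

def ritz_values(A, X):
    """Ritz values of Hermitian A with respect to range(X): eigenvalues of X^H A X, descending."""
    return np.linalg.eigvalsh(X.conj().T @ A @ X)[::-1]

def principal_angles(X, Y):
    """Principal angles in descending order, from cos(theta) = s^up(X^H Y)."""
    s = np.linalg.svd(X.conj().T @ Y, compute_uv=False)  # singular values, descending
    s = np.clip(s, 0.0, 1.0)                             # guard against rounding above 1
    return np.arccos(s[::-1])                            # ascending cosines give descending angles
```

Both functions expect matrices with orthonormal columns, which `orthonormal_basis` provides for any full-rank input.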

2.2. Majorization and Weak Majorization. We now briefly define the con- 
cepts of majorization and weak majorization which are comparison relations between 
two real vectors. For detailed information we refer the reader to [1, 7, 18]. 

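We say that $x \in \mathbb{R}^n$ is weakly (sub-)majorized by $y \in \mathbb{R}^n$, written $x \prec_w y$, if

(2.1)  $\sum_{i=1}^{k} x_i^\downarrow \le \sum_{i=1}^{k} y_i^\downarrow, \quad 1 \le k \le n,$

while $x$ is (strongly) majorized by $y$, written $x \prec y$, if (2.1) holds together with

(2.2)  $\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i.$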

Our final results in the paper are weak majorization bounds of the form $x \prec_w y$
with $x \ge 0$. On the one hand, we can see from (2.1) that $x \le y \Rightarrow x \prec_w y$, i.e.,
the inequality implies weak majorization. In our case the advantage of using weak
majorization is that the inequality $x \le y$ (the values of $x$ and $y$ become apparent
later) is simply wrong, while the weak majorization bound $x \prec_w y$ does hold. On the
other hand, a weak majorization bound $x \prec_w y$ implies $\max(x) \le \max(y)$. So if the
bound $\max(x) \le \max(y)$ is already known, but it is also known that $x \le y$ does not
hold, it makes sense to conjecture and to try to prove $x \prec_w y$.
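
Definitions (2.1) and (2.2) translate directly into a check on partial sums of the sorted vectors. The sketch below is a minimal pure-Python illustration; the function names and the tolerance parameter `tol` are assumptions made for this example, not part of the paper.

```python
def weakly_majorized(x, y, tol=1e-12):
    """True if x is weakly (sub-)majorized by y, i.e. the partial sums in (2.1) are ordered."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    sum_x = sum_y = 0.0
    for a, b in zip(sorted(x, reverse=True), sorted(y, reverse=True)):
        sum_x += a
        sum_y += b
        if sum_x > sum_y + tol:
            return False
    return True

def majorized(x, y, tol=1e-12):
    """True if x is (strongly) majorized by y: (2.1) holds together with the equality (2.2)."""
    return weakly_majorized(x, y, tol) and abs(sum(x) - sum(y)) <= tol
```

For instance, with `x = [1, 1]` and `y = [2, 0]` the component-wise inequality `x <= y` fails in the second entry, yet `weakly_majorized(x, y)` holds, which is exactly the situation described in the paragraph above.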

Strong '$\prec$' and weak '$\prec_w$' majorization relations share only some properties with
the usual inequality '$\le$' relation, so one should deal with them carefully. For example
'$\prec$' and '$\prec_w$' are reflexive and transitive, but $x \prec y$ and $y \prec x$ do not
imply $x = y$; e.g., [1, Remark II.1.2]. Similarly $x \prec y$ does not imply the intuitive
$x + z \prec y + z$, as is seen in the example $x = (0,0,0)$, $y = (2,-1,-1)$, $z = (-2,0,0)$.
So we must be particularly careful of the ordering when we combine results. Thus it can be seen
from (2.1) and (2.2) that $x + u \prec x^\downarrow + u^\downarrow$, e.g., [1, Corollary II.4.3], and

(2.3)  $\{x \prec_w y\}\ \&\ \{u \prec_w v\} \ \Rightarrow\ x + u \prec x^\downarrow + u^\downarrow \prec_w y^\downarrow + v^\downarrow,$

where this also holds with '$\prec_w$' replaced by '$\prec$'.
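
The counterexample above is small enough to verify by hand, and the following pure-Python sketch does the same check mechanically. The helper names are hypothetical; integer arithmetic is used so that the comparisons are exact.

```python
def partial_sums_desc(v):
    """Partial sums of v after sorting its entries in descending order."""
    total, sums = 0, []
    for t in sorted(v, reverse=True):
        total += t
        sums.append(total)
    return sums

def majorized(x, y):
    """Exact test of x majorized by y via (2.1) and (2.2); intended for integer data."""
    px, py = partial_sums_desc(x), partial_sums_desc(y)
    return all(a <= b for a, b in zip(px, py)) and px[-1] == py[-1]

x, y, z = (0, 0, 0), (2, -1, -1), (-2, 0, 0)
x_plus_z = [a + b for a, b in zip(x, z)]   # (-2, 0, 0)
y_plus_z = [a + b for a, b in zip(y, z)]   # (0, -1, -1)

print(majorized(x, y))                # x is majorized by y
print(majorized(x_plus_z, y_plus_z))  # but x + z is not majorized by y + z
```

The failure occurs at the second partial sum: the two largest entries of $x + z$ sum to $0$, while those of $y + z$ sum to $-1$.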

Some of the other basic majorization and related results we use are fairly obvious:

(2.4)  $A \in \mathcal{H}(n) \Rightarrow |\lambda(\pm A)|^\downarrow = s(A)$;

(2.5)  $|x \pm y| \prec_w |x|^\downarrow + |y|^\downarrow$, since from (2.3) $|x \pm y| \le |x| + |y| \prec |x|^\downarrow + |y|^\downarrow$;

(2.6)  $x \prec y \Rightarrow |x| \prec_w |y|$, see, e.g., [1, Example II.3.5].

Arithmetic operations, e.g., the sum and the product, on vectors used in majorization
are performed component-wise. In the subsequent Theorems 2.3 and 2.4 for rectangular
matrices we may need to operate with nonnegative vectors of different lengths. A standard
convention in this case is to add zeroes at the end of the shorter vector to match the
sizes needed for component-wise arithmetic operations and comparisons. We also use this
convention in later proofs.

Many inequality relations between eigenvalues and singular values are succinctly
expressed as majorization or weak majorization relations; and a beautiful example is

Theorem 2.1. (Lidskii [17], see also, e.g., [1, p. 69]). Let $A$ and $B \in \mathcal{H}(n)$. The
eigenvalues of $A$, $B$, and $A - B$ satisfy $\lambda(A) - \lambda(B) \prec \lambda(A - B)$.

Recall here that $\lambda(A) - \lambda(B) = \lambda^\downarrow(A) - \lambda^\downarrow(B)$. Note that the equivalent of (2.2)
holds here using $\mathrm{trace}(A) = \sum_i \lambda_i(A)$. We will use the following corollary:
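
Lidskii's theorem is easy to probe numerically. The sketch below, assuming NumPy and using hypothetical helper names, draws random Hermitian matrices and tests the majorization $\lambda(A) - \lambda(B) \prec \lambda(A - B)$ up to a rounding tolerance; it is a sanity check, not a proof.

```python
import numpy as np

def eig_desc(A):
    """Eigenvalues of a Hermitian matrix in descending order."""
    return np.linalg.eigvalsh(A)[::-1]

def is_majorized(x, y, tol=1e-10):
    """Check (2.1) and (2.2) for real vectors x, y up to the tolerance tol."""
    px = np.cumsum(np.sort(x)[::-1])
    py = np.cumsum(np.sort(y)[::-1])
    return bool(np.all(px <= py + tol) and abs(px[-1] - py[-1]) <= tol)

def random_hermitian(n, rng):
    """Random complex Hermitian n x n matrix."""
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / 2

def lidskii_holds(A, B):
    """Test lambda(A) - lambda(B) majorized by lambda(A - B), as in Theorem 2.1."""
    return is_majorized(eig_desc(A) - eig_desc(B), eig_desc(A - B))
```

The equality part (2.2) of the test corresponds to the trace identity noted above.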

Corollary 2.2. (E.g., [18, Chapter 9, G.1.d], [7, Corollary 3.4.3]). If $A$ and
$B \in \mathbb{C}^{n \times n}$ then $s(A \pm B) \prec_w s(A) + s(B)$. This corollary also follows from a
weaker statement than Lidskii's theorem, e.g., [1, Exercises II.1.14, II.1.15].

By using (2.3) we can see that Corollary 2.2 extends to the case of three or more 
matrices, because all vectors s(A), s(B), . . . are nonincreasing. 

We also use results for the singular values of a product of matrices: 

Theorem 2.3. (E.g., [7, Theorem 3.3.14]). $s(AB) \prec_w s(A)s(B)$ for arbitrary,
possibly rectangular, matrices $A$ and $B$ such that $AB$ exists.

Theorem 2.4. (E.g., [7, Theorem 3.3.16], [1, Problem III.6.2]).
$s(AB) \le \|A\| s(B)$ and $s(AB) \le \|B\| s(A)$ for arbitrary, possibly rectangular, matrices
$A$ and $B$ such that $AB$ exists.

3. Motivation and Main Results. The Rayleigh-Ritz method for approximating
eigenvalues of a Hermitian matrix $A$ finds the eigenvalues of $X^H A X$, where the
columns of the matrix $X$ form an orthonormal basis for a subspace $\mathcal{X}$. Here $\mathcal{X}$ is
called a trial subspace. The eigenvalues of $X^H A X$ do not depend on the particular choice
of basis and are called Ritz values of $A$ with respect to $\mathcal{X}$. If $\mathcal{X}$ is one-dimensional
and spanned by the unit vector $x$ there is only one Ritz value, namely the Rayleigh
quotient $x^H A x$.

When the trial subspace $\mathcal{X}$ is perturbed to become the subspace $\mathcal{Y}$, it is useful to
know how the Ritz values of $A$ vary. For one-dimensional $\mathcal{X}$ and $\mathcal{Y}$, spanned by unit
vectors $x$ and $y$ respectively, the following result appears in, e.g., [12, Theorem 1]:

(3.1)  $|x^H A x - y^H A y| \le \mathrm{spr}(A) \sin\theta(x, y).$

Here and below $\theta(x,y)$ is the acute angle between the two unit vectors $x$ and $y$ defined
by $\theta(x,y) = \arccos|x^H y| \in [0, \pi/2]$.

It is well known that every eigenvector is a stationary point of the Rayleigh
quotient (considered as a function of a vector), i.e., in the vicinity of an eigenvector
the Rayleigh quotient changes very slowly. The classic result that motivates this paper
is the following: the Rayleigh quotient approximates an eigenvalue of a Hermitian
matrix with accuracy proportional to the square of the eigenvector approximation
error. The following simple bound, e.g., [12, Theorem 4], demonstrates this:

(3.2)  $|x^H A x - y^H A y| \le \mathrm{spr}(A) \sin^2\theta(x, y),$

where we assume that one of the unit vectors $x$ or $y$ is an eigenvector of $A$. To give
a thorough background to our results we re-derive this important basic bound. Let
$Ax = x\lambda$, then $x^H A x = \lambda$ so $|x^H A x - y^H A y| = |y^H (A - \lambda I) y|$. We now plug in
the orthogonal decomposition $y = u + v$ where $u \in \mathrm{span}\{x\}$ and
$v \in (\mathrm{span}\{x\})^\perp$. Thus $(A - \lambda I)u = 0$ and $\|v\| = \sin\theta(x, y)$, which results in
$|y^H (A - \lambda I) y| = |v^H (A - \lambda I) v| \le \|A - \lambda I\| \|v\|^2 = \|A - \lambda I\| \sin^2\theta(x, y)$.
But $\|A - \lambda I\| \le \mathrm{spr}(A)$, giving (3.2).

Let us now discuss some generalizations of (3.1) and (3.2) for subspaces $\mathcal{X}$ and $\mathcal{Y}$
of dimensions higher than one, with $\dim\mathcal{X} = \dim\mathcal{Y}$. Let $X$ and $Y$ be two matrices
whose columns form orthonormal bases for $\mathcal{X}$ and $\mathcal{Y}$ respectively, and suppose that the
Ritz values of $A$ with respect to $\mathcal{X}$ and $\mathcal{Y}$ are arranged in descending (more precisely
"nonincreasing") order. To generalize (3.1) and (3.2) we replace the usual notion of
angles between vectors by a more general one of principal angles between subspaces,
and replace the inequality symbol by the weak (sub-)majorization symbol '$\prec_w$'.
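
The one-dimensional bounds (3.1) and (3.2) can be checked numerically with a few lines of NumPy. This is a minimal sketch under the assumption of unit-norm input vectors; the helper names `rayleigh`, `angle` and `spread` are illustrative only.

```python
import numpy as np

def rayleigh(A, v):
    """Rayleigh quotient v^H A v for a unit vector v and Hermitian A."""
    return float(np.real(v.conj() @ A @ v))

def angle(x, y):
    """Acute angle arccos|x^H y| between unit vectors x and y."""
    return float(np.arccos(min(1.0, abs(np.vdot(x, y)))))

def spread(A):
    """spr(A): largest minus smallest eigenvalue of Hermitian A."""
    w = np.linalg.eigvalsh(A)  # ascending order
    return float(w[-1] - w[0])

def bound_31(A, x, y):
    """Right-hand side of (3.1), valid for arbitrary unit vectors."""
    return spread(A) * np.sin(angle(x, y))

def bound_32(A, x, y):
    """Right-hand side of (3.2), valid when x or y is an eigenvector of A."""
    return spread(A) * np.sin(angle(x, y)) ** 2
```

Taking `x` as a column of the eigenvector matrix returned by `np.linalg.eigh(A)` and `y` as a normalized perturbation of `x`, one would expect `abs(rayleigh(A, x) - rayleigh(A, y))` to stay below `bound_32(A, x, y)`, which for small angles is far smaller than `bound_31(A, x, y)`.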

Let $\lambda(A)$ denote the vector of descending eigenvalues $\lambda_i(A)$ of a Hermitian matrix
$A$, $s(B)$ the vector of descending singular values of a matrix $B$, and $\theta(\mathcal{X}, \mathcal{Y})$ the
vector of descending principal angles $\theta_i(\mathcal{X},\mathcal{Y})$ between the subspaces $\mathcal{X}$ and
$\mathcal{Y}$, defined such that the vectors $\cos\theta(\mathcal{X},\mathcal{Y})$ and $s(X^H Y)$ are the same, except
for the reversed order, see, e.g., [2], [6, Section 12.4.3]. A recent paper [13] generalizes (3.1) to:

(3.3)  $|\lambda(X^H A X) - \lambda(Y^H A Y)| \prec_w \mathrm{spr}(A) \sin\theta(\mathcal{X},\mathcal{Y}).$

The weak majorization bound (3.3) implies, e.g., a bound for its largest term:

(3.4)  $\max_i |\lambda_i(X^H A X) - \lambda_i(Y^H A Y)| \le \mathrm{spr}(A)\, \mathrm{gap}(\mathcal{X}, \mathcal{Y}),$

where $\mathrm{gap}(\mathcal{X},\mathcal{Y}) = \max_i \{\sin\theta_i(\mathcal{X}, \mathcal{Y})\}$ in this case, e.g., [11, 13].

Both bounds (3.3) and (3.4) generalize (3.1) to multidimensional subspaces, but
no assumption of $A$-invariance is made in either case. What is the bound that generalizes
(3.2), assuming that one of the subspaces $\mathcal{X}$ or $\mathcal{Y}$ is $A$-invariant? A natural
conjecture, made in [13], is that such a bound could be obtained in terms of
$\sin^2\theta(\mathcal{X},\mathcal{Y})$. No majorization result of this kind is known, but simpler results
(for the largest error only) are available; e.g., the following important bound is proved in
[9], reproduced in [4, Theorem 2, p. 477], and [15, Theorem 2.4], with a different proof
suggested in [8, Theorem 2.2.3, p. 56]; for an English translation of the latter see
[10, Theorem 2.3, p. 383]. We present here a slightly modified formulation to make it
consistent with (3.4): if $\mathcal{X}$ or $\mathcal{Y}$ is $A$-invariant and corresponds to a contiguous set of
the extreme, i.e., largest or smallest, eigenvalues of $A$, then

(3.5)  $\max_i |\lambda_i(X^H A X) - \lambda_i(Y^H A Y)| \le \mathrm{spr}(A)\, \mathrm{gap}^2(\mathcal{X}, \mathcal{Y}).$

i 

Bound (3.5) generalizes (3.2), but does not take advantage of majorization. Com- 
paring (3.3) and (3.4) with (3.1), and (3.5) with (3.2), we make an educated guess 
for the general case where the invariant subspace is not necessarily associated with a 
contiguous set of extreme eigenvalues: 

Conjecture 3.1. Let the subspaces $\mathcal{X}$ and $\mathcal{Y}$ have the same dimension, with
orthonormal bases given by the columns of the matrices $X$ and $Y$ respectively. Let the
matrix $A$ be Hermitian, and $\mathcal{X}$ or $\mathcal{Y}$ be $A$-invariant. Then

(3.6)  $|\lambda(X^H A X) - \lambda(Y^H A Y)| \prec_w \mathrm{spr}(A) \sin^2\theta(\mathcal{X}, \mathcal{Y}).$
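
A numerical experiment along the following lines can be used to probe bound (3.6). The sketch assumes NumPy and uses hypothetical function names; it returns the two sides of (3.6), sorted in descending order, for given orthonormal bases `X` and `Y`, and compares their partial sums as in (2.1). Passing such a test for sampled data is of course only supporting evidence, not a proof.

```python
import numpy as np

def ritz_error_and_bound(A, X, Y):
    """Return (|lambda(X^H A X) - lambda(Y^H A Y)|, spr(A) sin^2 theta), both sorted descending."""
    lam = lambda M: np.linalg.eigvalsh(M)[::-1]           # descending eigenvalues
    err = np.abs(lam(X.conj().T @ A @ X) - lam(Y.conj().T @ A @ Y))
    cosines = np.clip(np.linalg.svd(X.conj().T @ Y, compute_uv=False), 0.0, 1.0)
    sin2 = 1.0 - cosines ** 2                             # sin^2 of the principal angles
    w = np.linalg.eigvalsh(A)
    spr = w[-1] - w[0]
    return np.sort(err)[::-1], spr * np.sort(sin2)[::-1]

def weakly_majorized(x, y, tol=1e-10):
    """Partial-sum test (2.1) for real vectors, up to the tolerance tol."""
    px = np.cumsum(np.sort(x)[::-1])
    py = np.cumsum(np.sort(y)[::-1])
    return bool(np.all(px <= py + tol))
```

A typical use is to take `X` as eigenvectors of `A` (so that its range is $A$-invariant), perturb it, orthonormalize the result to get `Y`, and evaluate `weakly_majorized(*ritz_error_and_bound(A, X, Y))`.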

We emphasize that the bound (3.6) involves the sine squared and, since convergence
analyses are of particular interest for small angles, this is a great improvement
over (3.3). This is just as we would hope, since one of the subspaces is $A$-invariant in
(3.6). The exact $A$-invariance assumption is equivalent to the subspace being spanned
by some exact eigenvectors of $A$, and Conjecture 3.1 is an a priori Rayleigh-Ritz
eigenvalue error bound which can be used to examine how the subspaces $\mathcal{Y}$ of an iterative
eigenproblem algorithm approach an ideal $A$-invariant subspace $\mathcal{X}$. As we mentioned
in the introduction, eigenvalue error bounds are important in many applications. We
refer the reader to the follow-up paper [14] where we extend some results of this paper
to Hilbert spaces, and discuss in detail applications to finite element methods and
subspace iterations.

The implications of the weak majorization inequality (3.6) in Conjecture 3.1 may
not be obvious to every reader. The weak majorization bound (3.6) directly implies

$\sum_{i=1}^{j} |\lambda_i(X^H A X) - \lambda_i(Y^H A Y)|^\downarrow \le \mathrm{spr}(A) \sum_{i=1}^{j} \sin^2(\theta_i^\downarrow(\mathcal{X}, \mathcal{Y})), \quad j = 1, \ldots, k,$

see (2.1), where $k = \dim\mathcal{X} = \dim\mathcal{Y}$. For example for $j = k$ we obtain

$\sum_{i=1}^{k} |\lambda_i(X^H A X) - \lambda_i(Y^H A Y)| \le \mathrm{spr}(A) \sum_{i=1}^{k} \sin^2(\theta_i(\mathcal{X}, \mathcal{Y})),$

and for $j = 1$ we get (3.5). Moreover, for real vectors $x$ and $y$ the weak majorization
$x \prec_w y$ is equivalent to the inequality $\sum_{i=1}^{n} \phi(x_i) \le \sum_{i=1}^{n} \phi(y_i)$ holding for any
continuous nondecreasing convex real valued function $\phi$, see, e.g., [18, Statement
4.B.2]. If for example we take $\phi(t) = t^p$ with $p \ge 1$, the bound (3.6) also implies

$\left( \sum_{i=1}^{k} |\lambda_i(X^H A X) - \lambda_i(Y^H A Y)|^p \right)^{1/p} \le \mathrm{spr}(A) \left( \sum_{i=1}^{k} \sin^{2p}(\theta_i(\mathcal{X}, \mathcal{Y})) \right)^{1/p}.$

We have not proven that Conjecture 3.1 holds in all circumstances, and indeed it 
might not (but we suspect it does). But we have proven it always holds if we multiply 
the bound by 1.5. In section 4 we also show that Conjecture 3.1 does hold in some 
very useful circumstances: 

Theorem 3.1. The bound (3.6) of Conjecture 3.1 holds if, in addition to the
assumptions of Conjecture 3.1, either or both of the following conditions hold:

(a) The $A$-invariant subspace $\mathcal{X}$ or $\mathcal{Y}$ corresponds to a contiguous set of the largest
(or smallest) eigenvalues of $A$.

(b) All the eigenvalues of $A$ corresponding to the $A$-invariant subspace $\mathcal{X}$ or $\mathcal{Y}$ lie
between (and possibly including) one extreme eigenvalue of $A$ and the midpoint
$[\lambda_1(A) + \lambda_n(A)]/2$ of $A$'s spectrum.

This does not cover all known cases where (3.6) holds, but it does cover many
practical cases. For example in approximating the eigenvalues of a Hermitian matrix,
perhaps using Lanczos' eigenvalue algorithm, e.g., [6, §9], we are often interested in
just one end of the spectrum. In section 4 we also show a weaker result always holds:
Theorem 3.2. Under the assumptions of Conjecture 3.1 we have

(3.7)  $|\lambda(X^H A X) - \lambda(Y^H A Y)| \prec_w \mathrm{spr}(A) \left[ e - \cos\theta(\mathcal{X}, \mathcal{Y}) + \tfrac{1}{2} \sin^2\theta(\mathcal{X}, \mathcal{Y}) \right].$

Here and below we use '$e$' to indicate a vector of ones. Note that the individual
elements for both vectors $e - \cos\theta(\mathcal{X},\mathcal{Y})$ and $\sin^2\theta(\mathcal{X},\mathcal{Y})$ are decreasing, since
both functions $1 - \cos\theta$ and $\sin^2\theta$ are monotonically increasing within $[0, \pi/2]$, and the
vector $\theta(\mathcal{X},\mathcal{Y})$ is chosen to be decreasing. We now deduce two simple corollaries of
Theorem 3.2. Using elementary trigonometry, for $\theta \in [0, \pi/2]$:

$2 - 2\cos\theta = 2 - 2\cos\theta - (1 - \cos\theta)^2 + (1 - \cos\theta)^2$
$\qquad = \sin^2\theta + (1 - \cos\theta)^2 = \sin^2\theta + \sin^4\theta / (1 + \cos\theta)^2$
$\qquad \le \sin^2\theta + \sin^4\theta.$

We first conclude that bound (3.7) is slightly worse than bound (3.6) from Conjecture
3.1; and second, we immediately obtain from (3.7):

Corollary 3.3. Under the assumptions of Conjecture 3.1, we have

(3.8)  $|\lambda(X^H A X) - \lambda(Y^H A Y)| \prec_w \mathrm{spr}(A) \left[ \sin^2\theta(\mathcal{X}, \mathcal{Y}) + \tfrac{1}{2} \sin^4\theta(\mathcal{X}, \mathcal{Y}) \right]$

(3.9)  $\qquad \le \tfrac{3}{2}\, \mathrm{spr}(A) \sin^2\theta(\mathcal{X}, \mathcal{Y}).$

Extending the above trigonometric relation we see that

$2 - 2\cos\theta = \sin^2\theta \left[ 1 + \frac{\sin^2\theta}{(1 + \cos\theta)^2} \right] = \frac{2 \sin^2\theta}{1 + \cos\theta} = \frac{\sin^2\theta}{\cos^2(\theta/2)} \le \tan^2\theta,$

for $\theta \in [0, \pi/2]$; and with $\sin^2\theta \le \tan^2\theta$, bound (3.7) implies another corollary:

Corollary 3.4. Under the assumptions of Conjecture 3.1, we have

(3.10)  $|\lambda(X^H A X) - \lambda(Y^H A Y)| \prec_w \mathrm{spr}(A) \tan^2\theta(\mathcal{X}, \mathcal{Y}).$
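
To see how the scalar factors multiplying $\mathrm{spr}(A)$ in (3.6) to (3.10) compare for a single angle, one can tabulate them. The following pure-Python sketch does this; the function name `bound_factors` and the dictionary keys are illustrative choices.

```python
import math

def bound_factors(theta):
    """Per-angle factors multiplying spr(A) in bounds (3.6)-(3.10), for theta in [0, pi/2]."""
    s2 = math.sin(theta) ** 2
    c = math.cos(theta)
    return {
        "(3.6)": s2,                      # conjectured bound
        "(3.7)": (1.0 - c) + 0.5 * s2,    # Theorem 3.2
        "(3.8)": s2 + 0.5 * s2 ** 2,      # Corollary 3.3, first form
        "(3.9)": 1.5 * s2,                # Corollary 3.3, second form
        "(3.10)": math.tan(theta) ** 2,   # Corollary 3.4
    }

for theta in (0.01, 0.1, 0.5, 1.0):
    f = bound_factors(theta)
    print(theta, {name: round(value, 8) for name, value in f.items()})
```

For small angles all five factors agree to leading order in $\theta^2$, which is why the bounds are asymptotically equivalent.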



We give an example in section 5 demonstrating that the conjectured bound (3.6) 
cannot be any tighter. Our numerical tests suggest that Conjecture 3.1 holds, i.e., 
that bound (3.7) can probably be improved to (3.6). However we show in section 5 
that already the first step in our proof of Theorem 3.2 does not allow us to prove 
the better bound (3.6), so a completely different approach is apparently needed to 
support Conjecture 3.1 in all cases — see section 5 for more thoughts on this. 

Conjecture 3.1 turns out to be easy to formulate, but hard to prove in its gener- 
ality. We believe that the present publication, which proves Conjecture 3.1 in several 
practically interesting particular cases and provides slightly weaker bounds (3.7)- 
(3.10) for the general case, is important since it serves as a theoretical foundation 
for our future work on applications, e.g., [14]. It is also novel: we know of no other
case where majorization is used for a priori Rayleigh-Ritz error bounds. The only
somewhat related result known to us is the pioneering work of [3], where majorization 
is applied to bound eigenvalue errors a posteriori. 

4. Proofs. We have all the tools needed to prove our main results, Theorem 3.1
and Theorem 3.2. At first both proofs develop along the same lines; later they split.

By the assumptions in the theorems, $\mathcal{X}$ and $\mathcal{Y}$ are two subspaces of $\mathbb{C}^n$ of the
same dimension $k$, and are the column ranges of matrices $X$ and $Y$ with orthonormal
columns that are arbitrary up to unitary transformations of their columns. Using
the singular value decomposition we choose such a pair of matrices $X$ and $Y$ with
orthonormal columns so that $C = X^H Y$ is real, square and diagonal, with the diagonal
entries in increasing order. Thus by the definition of angles between subspaces,

(4.1)  $C = \mathrm{diag}(s^\uparrow(X^H Y)) = \mathrm{diag}(\cos\theta(\mathcal{X}, \mathcal{Y})).$

We arbitrarily complete $X$ and $Y$ to unitary matrices $[X, X_\perp]$ and $[Y, Y_\perp] \in \mathcal{U}(n)$ and
consider the $2 \times 2$ partition of their unitary product $[X, X_\perp]^H [Y, Y_\perp]$. By construction
of $X$ and $Y$, its $k \times k$ upper left block is $C$. We denote its $(n-k) \times k$ lower left block by
$S = (X_\perp)^H Y$. Since $[X, X_\perp]^H [Y, Y_\perp]$ is unitary, the entries $C$ and $S$ of its first block
column satisfy $C^2 + S^H S = I$. So
$\lambda(S^H S) = \lambda(I - C^2) = e - \cos^2\theta(\mathcal{X}, \mathcal{Y}) = \sin^2\theta(\mathcal{X}, \mathcal{Y})$,
where $e$ is the vector of ones, and so the vectors of singular values $s(C)$ and $s(S)$ are
closely connected and we derive from this that

(4.2)  $\sin\theta(\mathcal{X}, \mathcal{Y}) = [s(S), 0, \ldots, 0],$

where $\max\{2k - n, 0\}$ zeroes are added on the right-hand side to match the number
$k$ of angles in the vector $\theta(\mathcal{X},\mathcal{Y})$ with the number $\min\{k, n-k\}$ of singular values in
the vector $s(S)$.
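
The choice of bases described above can be sketched in NumPy as follows. The helper names `aligned_bases` and `orth_complement` are hypothetical, and completing `X` through a full QR factorization is just one convenient way to obtain some $X_\perp$; the paper leaves the completion arbitrary.

```python
import numpy as np

def aligned_bases(X, Y):
    """Rotate orthonormal bases X, Y (same shape) so that C = X^H Y is diagonal and increasing."""
    U, s, Vh = np.linalg.svd(X.conj().T @ Y)   # X^H Y = U diag(s) Vh, s descending
    U, Vh = U[:, ::-1], Vh[::-1, :]            # reverse so the cosines increase
    return X @ U, Y @ Vh.conj().T

def orth_complement(X):
    """Orthonormal basis for the orthogonal complement of range(X) (X has full column rank)."""
    k = X.shape[1]
    Q, _ = np.linalg.qr(X, mode="complete")    # Q is n x n unitary; first k columns span range(X)
    return Q[:, k:]

def cosine_sine_blocks(X, Y):
    """Return C = X^H Y and S = X_perp^H Y for the aligned bases."""
    Xa, Ya = aligned_bases(X, Y)
    C = Xa.conj().T @ Ya
    S = orth_complement(Xa).conj().T @ Ya
    return C, S
```

With these blocks one can confirm numerically that $C^2 + S^H S = I$ and that the singular values of `S` reproduce the sines of the principal angles, as in (4.2).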

Both theorems assume that either $\mathcal{X}$ or $\mathcal{Y}$ is $A$-invariant, so without loss of
generality let $\mathcal{X}$ be $A$-invariant. Then since $[X, X_\perp]$ is unitary:

$[X, X_\perp]^H A\, [X, X_\perp] = \mathrm{diag}(A_{11}, A_{22})$, and $A = [X, X_\perp]\, \mathrm{diag}(A_{11}, A_{22})\, [X, X_\perp]^H.$

Here $X^H A X = A_{11} \in \mathcal{H}(k)$ and $(X_\perp)^H A X_\perp = A_{22} \in \mathcal{H}(n-k)$. We can now use
$Y^H [X, X_\perp] = [C^H, S^H] = [C, S^H]$ to show that

(4.3)  $Y^H A Y = Y^H \left( [X, X_\perp]\, \mathrm{diag}(A_{11}, A_{22})\, [X, X_\perp]^H \right) Y = C A_{11} C + S^H A_{22} S.$

The expression we want to bound in Theorems 3.1 and 3.2 now takes the form

$\lambda(X^H A X) - \lambda(Y^H A Y) = \lambda(A_{11}) - \lambda(C A_{11} C + S^H A_{22} S)$
$\qquad = \lambda(A_{11}) - \lambda(C A_{11} C) + \lambda(C A_{11} C) - \lambda(C A_{11} C + S^H A_{22} S)$
(4.4)  $\qquad \prec [\lambda(A_{11}) - \lambda(C A_{11} C)]^\downarrow + \lambda(-S^H A_{22} S),$

where this last line used Lidskii's Theorem 2.1 with (2.3). See the discussion following
(5.1) for more about this choice. Next (2.4), Theorems 2.3 and 2.4, and (4.2) give

(4.5)  $|\lambda(-S^H A_{22} S)|^\downarrow = s(S^H A_{22} S) \prec_w \|A_{22}\| \sin^2\theta(\mathcal{X}, \mathcal{Y}).$

At this point the proofs split. Each proof will use a different majorization of
$\lambda(A_{11}) - \lambda(C A_{11} C)$ in (4.4), but both will use (4.5). We first establish Theorem 3.1.
Neither (3.6) nor (3.7) is altered by replacing $A$ by $\pm A + \alpha I$ where $\alpha$ is an arbitrary
real constant, and so we can make the new $A_{11}$ nonnegative definite in each of the
parts (a) and (b) of Theorem 3.1 by choosing the appropriate sign and the shift $\alpha$.
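
The splitting (4.3) and the bound (4.5) can be checked on a small example. The sketch below makes a simplifying assumption not required by the paper: it takes $[X, X_\perp] = I$, so that $A$ is block diagonal and $C$, $S$ are simply the top and bottom blocks of $Y$. In that setting $C$ need not be diagonal, so the first term is written as $C^H A_{11} C$, which coincides with $C A_{11} C$ when $C$ is real and diagonal. The function names are hypothetical.

```python
import numpy as np

def split_ritz_matrix(A11, A22, Y):
    """With [X, X_perp] = I, return (C^H A11 C, S^H A22 S, C, S) where C, S are blocks of Y."""
    k = A11.shape[0]
    C, S = Y[:k, :], Y[k:, :]
    return C.conj().T @ A11 @ C, S.conj().T @ A22 @ S, C, S

def check_45(A22, S, tol=1e-10):
    """Partial-sum test of s(S^H A22 S) weakly majorized by ||A22|| sin^2 theta, as in (4.5)."""
    lhs = np.linalg.svd(S.conj().T @ A22 @ S, compute_uv=False)   # descending
    sin2 = np.linalg.eigvalsh(S.conj().T @ S)[::-1]               # lambda(S^H S) = sin^2 theta
    rhs = np.linalg.norm(A22, 2) * sin2
    return bool(np.all(np.cumsum(lhs) <= np.cumsum(rhs) + tol))
```

Adding the first two returned matrices should reproduce $Y^H A Y$ for $A = \mathrm{diag}(A_{11}, A_{22})$, which is the content of (4.3).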

Proof of Theorem 3.1. The starting point of the proof is (4.4), but now we
assume $A_{11}$ is nonnegative definite and so has a nonnegative definite square root
$\sqrt{A_{11}}$. We deal with $\lambda(A_{11}) - \lambda(C A_{11} C)$ first. For arbitrary square matrices $F$
and $G$ we have $\lambda(FG) = \lambda(GF)$. Taking $F = C\sqrt{A_{11}}$ and $G = \sqrt{A_{11}}\,C$, we get
$\lambda(C A_{11} C) = \lambda(\sqrt{A_{11}}\, C^2 \sqrt{A_{11}})$. Using this and Lidskii's Theorem 2.1 we see that

$\lambda(A_{11}) - \lambda(C A_{11} C) = \lambda(A_{11}) - \lambda\left(\sqrt{A_{11}}\, C^2 \sqrt{A_{11}}\right)$
$\qquad \prec \lambda\left(\sqrt{A_{11}} \sqrt{A_{11}} - \sqrt{A_{11}}\, C^2 \sqrt{A_{11}}\right)$
$\qquad = \lambda\left(\sqrt{A_{11}}\, (I - C^2) \sqrt{A_{11}}\right) = \lambda\left(\sqrt{A_{11}}\, S^H S \sqrt{A_{11}}\right),$

since $C^2 + S^H S = I$. Then using (2.6) with Theorem 2.4 (twice), and (4.2) we obtain

$|\lambda(A_{11}) - \lambda(C A_{11} C)| \prec_w s\left(\sqrt{A_{11}}\, S^H S \sqrt{A_{11}}\right) \le \|A_{11}\| \sin^2\theta(\mathcal{X}, \mathcal{Y}).$

Apply (2.6) to (4.4); then (2.5), (2.3), and (4.5) with the above bound give

$|\lambda(X^H A X) - \lambda(Y^H A Y)| \prec_w \left| [\lambda(A_{11}) - \lambda(C A_{11} C)]^\downarrow + \lambda(-S^H A_{22} S) \right|$
$\qquad \prec_w |\lambda(A_{11}) - \lambda(C A_{11} C)|^\downarrow + |\lambda(-S^H A_{22} S)|^\downarrow$
(4.6)  $\qquad \prec_w (\|A_{11}\| + \|A_{22}\|) \sin^2\theta(\mathcal{X}, \mathcal{Y}).$

Here this proof splits, and we first prove part (a) of Theorem 3.1. By assumption
the invariant subspace $\mathcal{X}$ corresponds to a contiguous set of the largest (or smallest)
eigenvalues of $A$. Here we present the proof for the case of the largest eigenvalues.
The case of the smallest eigenvalues follows immediately by substituting $-A$ for $A$.
We replace $A$ with $A + \alpha I$ where $\alpha$ is chosen as the constant real shift that makes
the new $A_{11}$ positive semidefinite (nonnegative definite and singular), so that $\sqrt{A_{11}}$
exists. Since $\dim\mathcal{X} = k$, and the invariant subspace $\mathcal{X}$ corresponds to a contiguous
set of the largest eigenvalues of $A$, $\alpha = -\lambda_k(A)$. After the shift $\lambda_k(A)$ becomes zero,
the eigenvalues of the block $A_{11}$ become nonnegative with $\lambda_1(A)$ being the largest
in absolute value, while the eigenvalues of the block $A_{22}$ become nonpositive with
$\|A_{22}\| = -\lambda_n(A)$. Thus $\|A_{11}\| + \|A_{22}\| = \lambda_1(A) - \lambda_n(A) = \mathrm{spr}(A)$. Using this together
with (4.6) gives (3.6), completing the proof of part (a).

For part (b) of Theorem 3.1 we prove the case where the eigenvalues of $A_{11}$ lie
in the top half of the spectrum of $A$; the remaining case is proven by substituting
$-A$ for $A$. Choose the shift so that for the new $A$, $\lambda_1(A) = -\lambda_n(A)$, ensuring with
the assumptions that $A_{11}$ is nonnegative definite and that $\|A_{11}\| \le \mathrm{spr}(A)/2$ and
$\|A_{22}\| \le \mathrm{spr}(A)/2$, so that (4.6) again leads to (3.6). □

In fact whenever we can choose the sign and shift in $\pm A + \alpha I$ so that this new $A$
has $A_{11}$ nonnegative definite with $\|A_{11}\| + \|A_{22}\| \le \mathrm{spr}(A)$, then (3.6) will be satisfied.

We return again to (4.4) and (4.5) to establish Theorem 3.2. 

Proof of Theorem 3.2. Applying Lidskii's Theorem 2.1 with (2.3) to (4.4) gives

$\lambda(X^H A X) - \lambda(Y^H A Y) \prec [\lambda(A_{11}) - \lambda(C A_{11} C)]^\downarrow + \lambda(-S^H A_{22} S)$
(4.7)  $\qquad \prec \lambda(A_{11} - C A_{11} C) + \lambda(-S^H A_{22} S).$

In order to bound this we will use the identity

(4.8)  $A_{11} - C A_{11} C = (I - C) A_{11} + C A_{11} (I - C),$

together with the following results obtained using (4.1) with Theorems 2.4 and 2.3:

(4.9)  $s((I - C) A_{11}) \le \|A_{11}\|\, s(I - C) = \|A_{11}\| (e - \cos\theta(\mathcal{X}, \mathcal{Y})),$

$s(C A_{11} (I - C)) \prec_w s(C)\, s(A_{11} (I - C)) \le s(A_{11} (I - C))$
(4.10)  $\qquad \le \|A_{11}\|\, s(I - C) = \|A_{11}\| (e - \cos\theta(\mathcal{X}, \mathcal{Y})).$

Discarding the first $C$ in $s(C A_{11} (I - C))$ is no real loss, see section 5. Using (4.8) and
applying (2.4), Corollary 2.2, and (2.3) with (4.9) and (4.10), gives

$|\lambda(A_{11} - C A_{11} C)|^\downarrow = s((I - C) A_{11} + C A_{11} (I - C))$
$\qquad \prec_w s((I - C) A_{11}) + s(C A_{11} (I - C))$
(4.11)  $\qquad \prec_w 2 \|A_{11}\| (e - \cos\theta(\mathcal{X}, \mathcal{Y})).$

Now apply (2.6) to (4.7), followed by (2.5), and use (4.11) and (4.5) with (2.3),
together with $\|A_{11}\|, \|A_{22}\| \le \|A\|$, to obtain:

$|\lambda(X^H A X) - \lambda(Y^H A Y)| \prec_w |\lambda(A_{11} - C A_{11} C) + \lambda(-S^H A_{22} S)|$
$\qquad \prec_w |\lambda(A_{11} - C A_{11} C)|^\downarrow + |\lambda(-S^H A_{22} S)|^\downarrow$
(4.12)  $\qquad \prec_w \|A\| \left[ 2 (e - \cos\theta(\mathcal{X}, \mathcal{Y})) + \sin^2\theta(\mathcal{X}, \mathcal{Y}) \right].$

Our final step is to replace $\|A\|$ by an expression involving $\mathrm{spr}(A)$. Observe here
that the difference between Ritz values is invariant under any shift $\alpha \in \mathbb{R}$. So we shift
$A$ in a way to minimize $\|A\|$. This situation occurs when zero is exactly in the middle of
the spectrum, in which case $\|A\| = \mathrm{spr}(A)/2$. Combining this observation with (4.12)
completes the proof of (3.7). □

5. Discussion. The following example shows that the conjectured bound (3.6)
cannot be improved as a general result. Let $n = 2m$ and let an arbitrary set of $m$
angles $\theta_i$ be given, where $\pi/2 \ge \theta_1 \ge \ldots \ge \theta_m \ge 0$. Let
$C = \mathrm{diag}(\cos(\theta_1), \ldots, \cos(\theta_m))$,

$X = [I, 0]^H, \quad Y = [C, \sqrt{I - C^2}]^H, \quad \text{and} \quad A = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix},$

where all unit matrices $I$ are of size $m$, so that $X$ and $Y$ are $n \times m$ and $A$ is $n \times n$.
Then the $\theta_i$ become the principal angles between the pair of $k = m$ dimensional
subspaces $\mathcal{X} = \mathcal{R}(X)$ and $\mathcal{Y} = \mathcal{R}(Y)$. Moreover the Ritz values are the eigenvalues of
$X^H A X = I$ and $Y^H A Y = 2C^2 - I$, and so
$|\lambda(X^H A X) - \lambda(Y^H A Y)|^\downarrow = 2 \sin^2\theta(\mathcal{X}, \mathcal{Y})$. In this example
$\mathrm{spr}(A) = 1 - (-1) = 2$, so (3.6) turns into an equality.
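
The sharpness example can be reproduced numerically. The sketch below assumes NumPy and takes $A = \mathrm{diag}(I, -I)$, which is the choice consistent with the stated values $X^H A X = I$, $Y^H A Y = 2C^2 - I$ and $\mathrm{spr}(A) = 2$; the function name `sharpness_example` is hypothetical.

```python
import numpy as np

def sharpness_example(thetas):
    """Build A, X, Y of the sharpness example for angles in [0, pi/2]; angles returned descending."""
    th = np.sort(np.asarray(thetas, dtype=float))[::-1]
    m = th.size
    C = np.diag(np.cos(th))
    S = np.diag(np.sin(th))                     # equals sqrt(I - C^2) on [0, pi/2]
    X = np.vstack([np.eye(m), np.zeros((m, m))])
    Y = np.vstack([C, S])
    A = np.block([[np.eye(m), np.zeros((m, m))],
                  [np.zeros((m, m)), -np.eye(m)]])
    return A, X, Y, th

def ritz_difference(A, X, Y):
    """|lambda(X^H A X) - lambda(Y^H A Y)| sorted in descending order."""
    lam = lambda M: np.linalg.eigvalsh(M)[::-1]
    return np.sort(np.abs(lam(X.T @ A @ X) - lam(Y.T @ A @ Y)))[::-1]
```

For any admissible list of angles, `ritz_difference(A, X, Y)` should agree with `2 * np.sin(th) ** 2` up to rounding, so the two sides of (3.6) coincide entry by entry.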

Asymptotically where all of the angles are small, bounds (3.6), (3.7), (3.8) and 
(3.10) are all equivalent. Moreover our numerical tests support Conjecture 3.1 in 
all cases. Perhaps in practical terms, from the point of view of a numerical analyst 
we are done. However it would be pleasing to know whether Conjecture 3.1 holds 
theoretically in its generality, since bound (3.6) looks more aesthetic and cannot be 
improved as a general result. 

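One important thing we know is that our approach of starting with Theorem 2.1
to deduce (4.4) (used in the proof of (3.7)) cannot reduce bound (3.7) to bound (3.6)
in general, no matter how we modify the rest of the proof. This can be seen from the
following example in $\mathbb{C}^4$. Let $A = \mathrm{diag}(A_{11}, A_{22})$ and $C = X^H Y$, $S = X_\perp^H Y$ be as in

(5.1)  $A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad [X, X_\perp] = I_4, \quad [Y, Y_\perp] = \begin{pmatrix} 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$

where $I_4$ is the $4 \times 4$ unit matrix, so that $X$, $X_\perp$, $Y$, $Y_\perp$ all have two columns. Then
$[X, X_\perp]^H [Y, Y_\perp] = [Y, Y_\perp]$ are chosen as in our proofs and we see that
$\theta(\mathcal{X}, \mathcal{Y}) = [\pi/2, 0]^T$, $C A_{11} C = 0$, $S^H A_{22} S = \mathrm{diag}(1, 0)$. Here the largest and
smallest eigenvalues of $A$ are $\pm 1$, so $\mathrm{spr}(A) = 2$. Hence by direct calculation

$X^H A X = A_{11} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad Y^H A Y = C A_{11} C + S^H A_{22} S = S^H A_{22} S = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},$

$|\lambda(X^H A X) - \lambda(Y^H A Y)| = \left| \begin{pmatrix} 1 \\ -1 \end{pmatrix} - \begin{pmatrix} 1 \\ 0 \end{pmatrix} \right| = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \prec_w \begin{pmatrix} 2 \\ 0 \end{pmatrix} = \mathrm{spr}(A) \sin^2\theta(\mathcal{X}, \mathcal{Y}),$

so example (5.1) does satisfy (3.6).

Let us now attempt to use (4.4) for (5.1). The right hand side of (4.4) is

$a = [\lambda(A_{11}) - \lambda(C A_{11} C)]^\downarrow + \lambda(-S^H A_{22} S) = \begin{pmatrix} 1 \\ -1 \end{pmatrix} + \begin{pmatrix} 0 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 \\ -2 \end{pmatrix},$

where it is not true that $|a| \prec_w \mathrm{spr}(A) \sin^2\theta(\mathcal{X}, \mathcal{Y})$. That is, the absolute value of
the right-hand side of (4.4) is not always weakly majorized by $\mathrm{spr}(A) \sin^2\theta(\mathcal{X}, \mathcal{Y})$, so
we cannot obtain a general proof of (3.6) starting from the majorization in (4.4).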

Example (5.1) can tell us even more. For any matrix $M = M^H$ we have the
following generalization of (4.7) ($M = C A_{11} C$ in (4.4) and (4.7)):

$\lambda(X^H A X) - \lambda(Y^H A Y) = \lambda(X^H A X) - \lambda(M) + \lambda(M) - \lambda(Y^H A Y)$
$\qquad \prec \lambda(X^H A X - M) + \lambda(M - Y^H A Y) \equiv a.$

It might be thought that if, e.g., $X^H A X$ is indefinite, some such $M$ could be chosen
to minimize $a$ and to prove (3.6). But in example (5.1) it can be shown that there is
no real symmetric $M$ giving $a$ satisfying the desired bound
$|a| \prec_w \mathrm{spr}(A) \sin^2\theta(\mathcal{X}, \mathcal{Y})$.
In particular $M = Y^H A Y$ will not give this bound, as the reader can check via (5.1).
That is, using $\lambda(X^H A X) - \lambda(Y^H A Y) \prec \lambda(A_{11} - C A_{11} C - S^H A_{22} S)$ in place of (4.4)
will still not give (3.6) via our approach.
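
The claims about example (5.1) can be checked with a few lines of NumPy. The matrices below are an assumed reading of (5.1), chosen to be consistent with the values quoted in the text ($\theta = [\pi/2, 0]^T$, $C A_{11} C = 0$, $S^H A_{22} S = \mathrm{diag}(1, 0)$, $\mathrm{spr}(A) = 2$); the variable and function names are illustrative.

```python
import numpy as np

def eig_desc(M):
    """Eigenvalues of a real symmetric matrix in descending order."""
    return np.linalg.eigvalsh(M)[::-1]

def weakly_majorized(x, y, tol=1e-12):
    """Partial-sum test (2.1) for real vectors."""
    px = np.cumsum(np.sort(x)[::-1])
    py = np.cumsum(np.sort(y)[::-1])
    return bool(np.all(px <= py + tol))

# Assumed reading of (5.1): X = first two columns of I_4, Y = [e3, e2].
A = np.array([[0., 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
X = np.eye(4)[:, :2]
Y = np.array([[0., 0], [0, 1], [1, 0], [0, 0]])

A11, A22 = A[:2, :2], A[2:, 2:]
C, S = X.T @ Y, Y[2:, :]                         # C = diag(0, 1), S = X_perp^H Y
spr = eig_desc(A)[0] - eig_desc(A)[-1]
bound = spr * np.sort(1.0 - np.diag(C) ** 2)[::-1]   # spr(A) sin^2 theta

err = np.abs(eig_desc(X.T @ A @ X) - eig_desc(Y.T @ A @ Y))
# Right-hand side of (4.4), i.e. M = C A11 C:
a_44 = np.sort(eig_desc(A11) - eig_desc(C @ A11 @ C))[::-1] + eig_desc(-S.T @ A22 @ S)
# Choice M = Y^H A Y, for which the second term lambda(M - Y^H A Y) vanishes:
a_M = eig_desc(A11 - Y.T @ A @ Y)
```

Under this reading, `weakly_majorized(err, bound)` holds, while the same test applied to `np.abs(a_44)` or `np.abs(a_M)` fails, matching the discussion above.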

So on the one hand we cannot improve bound (3.7) to give (3.6) except possibly 
by considering a different approach to our present way of using Lidskii's Theorem 2.1 
or equivalent in the first step, see (4.4) and (4.7). On the other hand our numerical 
tests suggest that the tighter bound (3.6) holds. Thus if we are to prove (3.6) for 
widely spread interior eigenvalues, we appear to need an approach more sophisticated 
than our particular application of Lidskii's theorem in the first step. 

An essentially equivalent first step was used in [12, Theorem 10] in an earlier
attempt to prove (3.3), where it led to an artificial multiplier $\sqrt{2}$ in the right-hand
side of (3.3). The subsequent paper [13] used an unusual technique to extend an
arbitrary Hermitian operator to an orthogonal projector in a higher dimensional space,
preserving its Ritz values, to prove (3.3) as it is stated, without the multiplier $\sqrt{2}$.
Perhaps the same technique might shed light here, and help us to establish Conjecture
3.1, but this currently remains an open question.

Conclusions. We clarify a conjecture of Knyazev and Argentati [13] on a bound
for the absolute difference between Ritz values of a Hermitian matrix $A$ for two trial
subspaces, one of which is $A$-invariant. We prove the conjecture for the cases where
(a) the $A$-invariant subspace corresponds to a contiguous set of the largest (or smallest)
eigenvalues of $A$, and (b) the eigenvalues of $A$ corresponding to the $A$-invariant
subspace all lie in the top (or the bottom) half of the spectrum of $A$. We prove a
slightly weaker bound for general invariant subspaces. We believe that the conjecture
holds, i.e., that this weaker bound can be improved, and this is supported by our
numerical tests, but the proof of the conjecture in its generality (if it is true) may
require an unorthodox approach, perhaps one such as that used in [13]. These results
are useful in practice, and for example are applicable to the analysis of routines which
use the Rayleigh-Ritz method, such as some Krylov subspace methods. We refer the
reader to the subsequent paper [14], where we extend some results of this paper to
Hilbert spaces, and discuss in detail their application to finite element methods and
subspace iterations.